In this project, our team explores the relationship between COVID-19 and other causes of death over the past 2 years. By combining and analyzing several data sets, we can characterize deaths due to COVID-19, identify who is at higher risk of death, and point to resources that could reduce the risk of death from or exposure to COVID-19. Specifically, we hope to answer the following 3 questions throughout the project:
1. Where does COVID-19 rank relative to other long-term causes of death when grouped by factors such as location, age group, and gender?
2. Is there a correlation between the COVID-19 death rate and deaths caused by respiratory illnesses, and how does that relationship vary across groups?
3. How does the trend in COVID-19 deaths relate to deaths from other cause groups over time?
In this exploratory report, we focus on visually and statistically assessing the distributions of, and relationships among, the variables of interest, which are discussed below. To do so, we use statistical measures such as means, medians, and correlations, along with visualization tools including ggplot, plotly, and dygraphs.
In this section, we investigate the distribution of the weekly death count due to natural causes, as well as the death count due to COVID-19 across gender. The first plot uses the Weekly Provisional Counts of Deaths data set, while the second uses the Provisional COVID-19 Deaths data set. Examining and comparing these distributions lets us summarize the death counts statistically, that is, compute the mean, median, minimum, and maximum of a given metric. We can also check whether the distribution differs across groups, which here means gender. The 2 metrics used throughout this section are:
- Natural.Cause: Death counts due to natural causes
- Covid.19.Deaths: Death counts due to COVID-19

In the first plot, we look at how the weekly natural death count is distributed across the data set. A box plot is an effective visualization for this because it displays the median, quartiles, and extremes at a glance.
| Quantile | Value |
|---|---|
| 0% | 37624.00 |
| 25% | 54047.75 |
| 50% | 57051.00 |
| 75% | 63255.50 |
| 100% | 81651.00 |
Additionally, the mean death count is 59456.472, which is above the median of 57051.00. Looking at the box plot, we can conclude that the weekly natural death count is right skewed: most of the box lies to the right of the median (the upper quartile is about 6,204 above the median, while the lower quartile is only about 3,003 below it). Equivalently, the distribution has a longer right tail.
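The skew reading can also be checked numerically from the quantile table. The sketch below is written in Python purely for illustration (the report's own plots rely on R tooling) and computes the quartile skewness coefficient, often called Bowley's measure, from the reported quantiles. The function name `bowley_skewness` is our own and is not part of the original analysis.

```python
def bowley_skewness(q1: float, median: float, q3: float) -> float:
    """Quartile (Bowley) skewness: positive when the box is wider above the median."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Quantiles of the weekly natural-cause death count, copied from the table above.
quantiles = {0: 37624.00, 25: 54047.75, 50: 57051.00, 75: 63255.50, 100: 81651.00}
mean_count = 59456.472  # mean reported in the text

skew = bowley_skewness(quantiles[25], quantiles[50], quantiles[75])
upper_box = quantiles[75] - quantiles[50]  # width of the box above the median
lower_box = quantiles[50] - quantiles[25]  # width of the box below the median

print(f"Bowley skewness: {skew:.3f}")
print(f"Box above median: {upper_box:.2f}, below median: {lower_box:.2f}")
print(f"Mean exceeds median: {mean_count > quantiles[50]}")
```

A positive coefficient, a wider upper half of the box, and a mean above the median all point in the same direction as the visual reading of the box plot.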
In this section, we investigate the density of the log-transformed death counts due to COVID-19 across gender. We plot the log of the death counts because the raw counts are heavily right skewed and span several orders of magnitude, which would compress most of the distribution into a narrow band on a linear scale. The plot below shows how the distribution differs between the 2 genders:
As with the first plot, we use summary statistics to compare the mean, median, minimum, and maximum of the log death count due to COVID-19 across the two genders. The table below summarizes the results:
| Sex | Mean | Median | Minimum | Maximum |
|---|---|---|---|---|
| Female | 5.458237 | 5.342334 | 1.0986123 | 12.88270 |
| Male | 5.616668 | 5.480639 | 0.6931472 | 13.09611 |
From the plot and table above, we can see that the distributions of death counts for the two genders are similar to one another: both log death count densities follow a similar bell-shaped curve. The mean and median are slightly higher for males than for females.
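To make the log scale easier to interpret, the table's values can be mapped back to the count scale. The Python sketch below is illustrative only and assumes the logs are natural logarithms, which the minima suggest (0.6931472 and 1.0986123 match ln 2 and ln 3). The dictionary layout and the helper `to_counts` are our own naming, not taken from the original analysis.

```python
import math

# Log-scale summary values copied from the table above.
log_summary = {
    "Female": {"mean": 5.458237, "median": 5.342334, "min": 1.0986123, "max": 12.88270},
    "Male": {"mean": 5.616668, "median": 5.480639, "min": 0.6931472, "max": 13.09611},
}


def to_counts(stats: dict) -> dict:
    """Back-transform natural-log values to the count scale."""
    return {name: math.exp(value) for name, value in stats.items()}


raw_summary = {sex: to_counts(stats) for sex, stats in log_summary.items()}

for sex, stats in raw_summary.items():
    # exp(mean of logs) is the geometric mean of the counts, not the arithmetic mean.
    print(
        f"{sex}: geometric mean ~ {stats['mean']:.0f}, median ~ {stats['median']:.0f}, "
        f"min ~ {stats['min']:.0f}, max ~ {stats['max']:.0f}"
    )
```

Note that exponentiating the mean of the logs gives a geometric mean, so it should not be compared directly with an arithmetic mean of raw counts.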
In this section, we analyze the relationship between total deaths and different causes of death, including COVID-19, Pneumonia, and Influenza. We first examine how the total death count due to COVID-19 evolved over time to understand when COVID-19 had the greatest impact. We then examine the trend in the total death count due to the various respiratory diseases in the data set. The metrics below, from the Provisional COVID-19 Deaths data set, are used in the visualizations:
- Covid.19.Deaths: Death counts due to COVID-19
- Pneumonia.Deaths: Death counts due to Pneumonia
- Influenza.Deaths: Death counts due to Influenza

The interactive evolution plot shows the trend line of the total death count due to COVID-19 from February 2020 to January 2022. The total death count is grouped by month and covers all genders and ages in the United States. The highest peak was in January 2021, followed by April 2020 and September 2021.

These 3 peaks line up with distinct waves of the pandemic rather than with a single variant. April 2020 corresponds to the initial wave, and the January 2021 peak came before the Delta variant was widespread in the United States. Delta was first detected in India in October 2020 and became the dominant variant in the United States in mid-2021, which is consistent with the September 2021 peak, while Omicron began surging in December 2021. Many sources report that the symptoms associated with Delta are more severe than those of the original strain and of Omicron. For future research, it would be valuable to break down the total death count by COVID-19 variant, assuming such data is available.
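The monthly grouping behind the evolution plot can be sketched as follows. This is a minimal Python illustration, not the code used for the report: the record layout `(year, month, sex, deaths)` is an assumption about how rows might look, and the numbers are synthetic values that are not taken from the Provisional COVID-19 Deaths data set.

```python
from collections import defaultdict

# Synthetic records for illustration only: (year, month, sex, COVID-19 deaths).
records = [
    (2020, 4, "Female", 120), (2020, 4, "Male", 150),
    (2020, 5, "Female", 80), (2020, 5, "Male", 95),
    (2021, 1, "Female", 200), (2021, 1, "Male", 240),
]


def monthly_totals(rows):
    """Sum deaths per (year, month) across all other groupings such as sex."""
    totals = defaultdict(int)
    for year, month, _sex, deaths in rows:
        totals[(year, month)] += deaths
    # Sort chronologically so the result can be plotted as a time series.
    return dict(sorted(totals.items()))


totals = monthly_totals(records)
peak_month = max(totals, key=totals.get)  # (year, month) with the largest total

for (year, month), deaths in totals.items():
    print(f"{year}-{month:02d}: {deaths}")
print(f"Peak month in the synthetic data: {peak_month[0]}-{peak_month[1]:02d}")
```

Summing over sex and age before plotting is what makes each point represent all groups for that month.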
The second interactive chart compares the total death count due to different respiratory diseases: Pneumonia, Influenza, and COVID-19. The bars for the various causes appear to show similar trends over time. However, over the last year, COVID-19 deaths exceed Pneumonia deaths only during the outbreak of the Omicron variant, as shown by the COVID-19 bars rising above the Pneumonia bars. This is an interesting finding because it suggests that the impact of COVID-19 on deaths was less severe than in the previous year (2020), and that Omicron may be less severe than earlier variants of COVID-19.
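The visual impression that the causes move together can be quantified with a Pearson correlation between monthly series. The Python sketch below shows one way to do this under stated assumptions: the two lists are synthetic monthly totals invented for illustration, and the helper `pearson` is our own function, not part of the original analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Synthetic monthly totals, NOT values from the data set.
covid_deaths = [10, 40, 25, 60, 30]
pneumonia_deaths = [22, 48, 37, 69, 41]

r = pearson(covid_deaths, pneumonia_deaths)
print(f"Pearson r between the two synthetic series: {r:.3f}")
```

A coefficient near 1 would support the observation that the series rise and fall together, although correlation alone does not show that one cause drives the other, especially since a single death can list both COVID-19 and Pneumonia.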